Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove PDFPageView.updatePosition since it's not actually necessary #8531

Merged
merged 1 commit into from
Jun 15, 2017
Merged

Remove PDFPageView.updatePosition since it's not actually necessary #8531

merged 1 commit into from
Jun 15, 2017

Conversation

Snuffleupagus
Copy link
Collaborator

This method is currently called from PDFViewer._scrollUpdate on every scroll event in the viewer.

However, I cannot see why this code is now necessary (assuming that it once was), since text-selection and searching still works exactly the same way with this patch as with the current master.

When PDFPageView.updatePosition is called, the page can be in either of these states:

  1. The page hasn't been rendered, in which case the textLayer property doesn't exist yet.
  2. The page is currently rendering, meaning that the textLayer property exists. Given that the textContent won't be fetched until the page has been successfully rendered, TextLayerBuilder.render will return immediately and effectively be a no-op (since there's nothing to render yet).
  3. The has been been rendered, and the textLayer is currently rendering.
  4. The page, and its textLayer, has been completely rendered. In this case, TextLayerBuilder.render will return immediately and effectively be a no-op.

Here, only the third case seem to require any further analysis:
When scrolling occurs while the textLayer is rendering, TextLayerBuilder.render will via a helper method call TextLayerRenderTask.cancel (in src/display/text_layer.js) to stop processing.
However, due to the run-to-completion nature of JavaScript, once TextLayerRenderTask._render has been invoked appendText will always run.[1]

So even though we cancel rendering of pending textLayers during scrolling, via the repeated TextLayerBuilder.render calls from within the PDFPageView.updatePosition method, that does not prevent us from running the code inside of TextLayerRenderTask._render over and over for the same page; which all seems very inefficient to me.[2]

All this will thus have the effect of delaying the actual rendering of a textLayer ever so slightly while scrolling in the viewer. However, it does so at the expense of potentially hundreds of unnecessary appendText calls.[3]

Hence it seems to me that it's less resource intensive overall to simply let rendering of the textLayer complete, once it has started. Obviously, we still abort all rendering of a page, and its textLayer, when it's being destroyed (e.g. by being evicted from the page cache).


In case that there's any worry that the patch could affect e.g. highlighting of search results, please note that the existing code in TextLayerBuilder.render already calls updateMatches when the TextLayerTask resolves successfully.

I'm sorry that this became quite long, but to try and summarize:
PDFPageView.updatePosition doesn't actually do anything in most cases. In the one case where it matters, it seems that it's actually doing more harm than good; which is why I'm proposing that we just remove it.


[1] Although we may be able to skip the render call, provided that it happens after a timeout (as is the case in the default viewer).
[2] With current work being done to support streaming of TextContent, we'd also need to add just as many duplicate API calls to PDFPageView.updatePosition.
[3] The number of duplicate appendText calls is directly proportional not only to the scroll speed, but also to the number of pages in the document.

This method is currently called from `PDFViewer._scrollUpdate` on *every* scroll event in the viewer.

However, I cannot see why this code is now necessary (assuming that it once was), since text-selection and searching still works *exactly* the same way with this patch as with the current `master`.

When `PDFPageView.updatePosition` is called, the page can be in either of these states:
 1. The page hasn't been rendered, in which case the `textLayer` property doesn't exist yet.
 2. The page is currently rendering, meaning that the `textLayer` property exists. Given that the `textContent` won't be fetched until the page has been successfully rendered, `TextLayerBuilder.render` will return immediately and effectively be a no-op (since there's nothing to render yet).
 3. The has been been rendered, and the `textLayer` is currently rendering.
 4. The page, and its `textLayer`, has been completely rendered. In this case, `TextLayerBuilder.render` will return immediately and effectively be a no-op.

Here, only the *third* case seem to require any further analysis:
When scrolling occurs while the `textLayer` is rendering, `TextLayerBuilder.render` will via a helper method call `TextLayerRenderTask.cancel` (in src/display/text_layer.js) to stop processing.
However, due to the run-to-completion nature of JavaScript, once `TextLayerRenderTask._render` has been invoked `appendText` will always run.[1]

So even though we cancel rendering of pending `textLayer`s during scrolling, via the repeated `TextLayerBuilder.render` calls from within the `PDFPageView.updatePosition` method, that does *not* prevent us from running the code inside of `TextLayerRenderTask._render` over and over for the *same* page; which all seems *very* inefficient to me.[2]

All this will thus have the effect of delaying the *actual* rendering of a `textLayer` ever so slightly while scrolling in the viewer. However, it does so at the expense of potentially hundreds of unnecessary `appendText` calls.[3]

Hence it seems to me that it's less resource intensive overall to simply let rendering of the `textLayer` complete, once it has started. Obviously, we still abort all rendering of a page, and its `textLayer`, when it's being destroyed (e.g. by being evicted from the page cache).

In case that there's any worry that the patch could affect e.g. highlighting of search results, please note that the existing code in `TextLayerBuilder.render` already calls `updateMatches` when the `TextLayerTask` resolves successfully.

*I'm sorry that this became quite long, but to try and summarize:*
`PDFPageView.updatePosition` doesn't actually do anything in *most* cases. In the one case where it matters, it seems that it's actually doing more harm than good; which is why I'm proposing that we just remove it.

---
[1] Although we may be able to skip the `render` call, provided that it happens *after* a `timeout` (as is the case in the default viewer).
[2] With current work being done to support streaming of `TextContent`, we'd also need to add just as many duplicate API calls to `PDFPageView.updatePosition`.
[3] The number of duplicate `appendText` calls is directly proportional not only to the scroll speed, but also to the number of pages in the document.
@yurydelendik
Copy link
Contributor

/botio-windows preview

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_preview from @yurydelendik received. Current queue size: 0

Live output at: http://54.215.176.217:8877/4ae586d55906d3b/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/4ae586d55906d3b/output.txt

Total script time: 19.48 mins

Published

@yurydelendik
Copy link
Contributor

I was looking to the old code (fcdf266) and I think this is the right move now. It was a problem back then due to non-optimized text layer building and lots of divs.

If we will feel we regress div heavy text layers, we will add "reset timeout" functionality to match the behavior of the original 5-year old commit.

@yurydelendik yurydelendik merged commit 2097513 into mozilla:master Jun 15, 2017
@yurydelendik
Copy link
Contributor

Thank you for the patch.

@yurydelendik yurydelendik mentioned this pull request Jun 15, 2017
4 tasks
@Snuffleupagus Snuffleupagus deleted the rm-updatePosition branch June 16, 2017 13:55
movsb pushed a commit to movsb/pdf.js that referenced this pull request Jul 14, 2018
Remove `PDFPageView.updatePosition` since it's not actually necessary
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants